Overview

In part 1, we established that:

  1. Corn and soybean yields both responded to rotation. Specifically, corn yields were lower in the 2-year rotation, and soybean yields were lower in the two-year rotation and following corn.
  2. Plant health at the seedling and flowering stage predicted both corn and soybean yield - healthy crops at the seedling stage tended to be healthy at flowering, and also yielded more.
  3. Rotation aspects altered plant health as well as yield. Specifically:
    • corn health was lower in the 2-year rotation at both seedling and flowering stages, and marginally lower following soybean at the seedling stages. This suggests that:
      1. soybean suppresses the health of the following corn crop
      2. corn in the 4-year rotation has greater capacity to recover from the soybean suppression
    • soybean health was lower following corn at the seedling stage, and lower in the 2-year rotation and following corn at the flowering stage. This suggests that:
      1. corn suppresses the health of the following soybean crop
      2. soybean in the 4-year rotation can recover from corn suppression.
  4. Nutrient uptake was not a reliable indicator of plant health - only biomass was. This suggests a microbial mechanism.

From the overview of Part 1, we can now pursue this microbial mechanism, and identify candidate microbial characteristics to use as indicators for “good” communities - i.e. putative causal agents. The workflow is:

  1. Determine whether microbial communities at each time point vary in response to rotation, and aspects of rotation that also affect health, using RDA.
    1. Hypothesis All fungal and bacterial communities will respond to rotation.
    2. Hypothesis Previous crop will have a strong affect at seedling, due to transient crop effects.
    3. Hypothesis Rotation length will affect communities at flowering, due to structural community shifts.
  2. Derive community metrics to use as predictors of plant health. Specifically, we want to find features of the community that vary consistently with rotation aspects. These will vary consistently with rotation and also with rotation contrasts.
    1. Hypothesis None save an abundance of pathogens. This is exploratory. Interpretation in step 3 below.
  3. Determe whether these metrics predict plant health independently of rotation. This is mediation analysis.
    1. Hypothesis These microbial predictors predict plant health independently of rotation characteristics, and therefore fully mediate the effect of rotation on plant health.
  4. Incorporate these metrics into the structural equation model from Goal 1, to test whether these metrics do improve our understanding of the scaffold.
    1. Hypothesis By incorporating microbial communities, we improve the explanatory power of the structural model, and therefore have a stronger understanding of the system.

If microbes are a plausible mechanism for suppressed corn and soybean yield in the 2-year rotation, we will move to interpreting the putative causal agents using FUNGuild and picrust2.

  1. Hypothesis Fungal pathogen/mutualist ratios will be higher in the corn/soybean rotation. Sunflower will specifically alter this, per Benitez-Ponce et al (2018).
  2. Hypothsis Bacterial communities will display more PGPR characteristics and more metabolic diversity in the 4-year rotation.

Unlike Part 1, this analysis will be completed on a crop-by-crop basis.

Notes on analyzing high throughput sequencing barcoding data.

We’ll process both 16S and ITS data as compositional data. The characteristics of this are:

  1. The absolute values of individuals are meaningless
  2. The total values within samples are meaningless
  3. Due to the unit-sum constraint (because total values are meaningless, so we work with proportions), all features (taxa) are inherently correlated with each other. Data that are transformed to a unit-sum are ‘closed’.
  4. Therefore, only relative abundance is meaningful, but only in relation to the relative abundance of other components in the sample.

As a result, the analysis needs to preserve a few properties to be robust (see Greenacre and Lewi (2009)):

  1. Subcompositional coherence. Analyzing a subset of the features (reclosed) will give the same results regardless of whether additional features are taken into consideration; adding new features won’t make samples more alike. This is essential because we know we aren’t sampling all taxa in each sample.
  2. Scale invariance. Changing units doesn’t change the result, as long as the constant sum among samples is obeyed. This means that normalization methods don’t matter, as long as we then close the data.
  3. Distributional equivalence. Merging features won’t change results. Therefore, we can cluster taxa at 97% or as actual sequence variants, and distances among samples won’t change.

Weighted log-ratio analysis is the ordination approach to handling such data. Because it requires logarithms, it requires some way to handle zeros in the dataset, ideally that also preserves relationships among observed data. Martin-Fernandez et al. (2015) develop an imputation approach that accomplishes this goal, and implement it in the package, zCompositions.

We still want to remove extremely rare taxa - those that are so rare their varaince can’t be estimated - and samples with so few taxa we can’t estimate variance on sample relationships to other samples. We will retain samples with at least 3 taxa, and taxa appearing in at least 3% of samples - which works out to 5 for this 160 sample dataset.

We also want to account for sampling depth, so samples will be weighted based on log10 total reads.

Load Data

Packages and Custom Scripts

# Custom functions
functions = c('Convenience Functions.R', 
              # 'Ordination Functions.R',
              'Part 2 - Helper Functions.R')
source_functions = function() {
  require(here)
  for (i in functions) source(here('Scripts', i))
}
source_functions()

# Libraries 
libs = c('magrittr', 
         'here',
         'zCompositions', # for zero imputation
         'easyCODA',      # for CLR transformation
         'phyloseq',      # convenient object management
         'lavaan',        # SEM
         'lavaanPlot',    # SEM plots?
         'semPlot',       # ugly but effective plots
         'ggplot2'
         )     
load_libs(libs)  # load libraries, and install if necessary

#pcwOrd
load_pcwOrd = function() {
  here('Scripts', 'pcwOrd_0.2.1.Rdata') %>% 
    load(envir=.GlobalEnv)
}
load_pcwOrd()

# pcwOrd permutations
nperms = 9999

# lavaan option
sem_error = 'bootstrap'
# sem_error = 'default'

Data Sources

in_plot = 'Master Plot Data.csv'

in_bac_counts = 'Master 16S Counts.csv'
in_bac_taxonomy = 'Master 16S Taxonomy.csv'

in_fun_counts = 'Master ITS Counts.csv'
in_fun_taxonomy = 'Master ITS Taxonomy.csv'
in_funguild = 'FUNGuild Taxonomy.guilds.csv' # direct output from FUNGuild converted to .csv in MSExcel

in_obj = c('Contrasts.Rdata',
           'Pretty Names.Rdata')

Plot data, as before.

plot = here('Data', in_plot) %>% 
              read.csv(stringsAsFactors=FALSE)
plot[, 1:14] %<>% lapply(as.factor)
corner(plot)
bac_ps = load_to_phyloseq('BACTERIA_ID',    # Part 2 helper functions. Includes zero replacement.
                          in_bac_counts,
                          in_bac_taxonomy,
                          plot,
                          min_tax=3,
                          min_occurrances=8,
                          min_reads=1)
## No. corrected values:  2418
# remove Bradyrhizobium
rw = attr(bac_ps, 'row_weights')
bac_ps %<>% subset_taxa(Genus != 'Bradyrhizobium')
otu_table(bac_ps) %<>% 
  close_matrix %>% 
  otu_table(taxa_are_rows=FALSE)
attr(bac_ps, 'row_weights') = rw

fun_ps = load_to_phyloseq('FUNGI_ID', 
                          in_fun_counts,
                          in_fun_taxonomy,
                          plot,
                          min_tax=3,
                          min_occurrances=8,
                          min_reads=1) # per sample
## No. corrected values:  15156
phylo_obj = list(`16S` = bac_ps,
                 `ITS` = fun_ps)

print(phylo_obj)
## $`16S`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 370 taxa and 157 samples ]
## sample_data() Sample Data:       [ 157 samples by 53 sample variables ]
## tax_table()   Taxonomy Table:    [ 370 taxa by 6 taxonomic ranks ]
## 
## $ITS
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 433 taxa and 158 samples ]
## sample_data() Sample Data:       [ 158 samples by 53 sample variables ]
## tax_table()   Taxonomy Table:    [ 433 taxa by 7 taxonomic ranks ]

For comparison, the original ITS data (sent through ITSxpress) was:

$ITS
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 201 taxa and 148 samples ]
sample_data() Sample Data:       [ 148 samples by 53 sample variables ]
tax_table()   Taxonomy Table:    [ 201 taxa by 7 taxonomic ranks ]

FUNGuild

funguild = here('Data', in_funguild) %>% 
  read.csv(stringsAsFactors=FALSE, row.names=1) %>% 
  .[rownames(tax_table(fun_ps)), ]
corner(funguild)

Contrasts and nice names for plotting

for (i in in_obj) load(here('Data', i), envir=.GlobalEnv)
lapply(contrasts, round, 3)
## $corn
##                    COWS   CPWS CSCS  CSSwP CSSwSf
## ROTATION_LENGTH  -0.250 -0.250    1 -0.250  -0.25
## FOLLOW_SOY        0.500  0.500    0 -0.500  -0.50
## FOLLOW_SUNFLOWER -0.333 -0.333    0 -0.333   1.00
## 
## $soy
##                  COWS  CPWS CSCS CSSwP CSSwSf
## ROTATION_LENGTH -0.25 -0.25    1 -0.25  -0.25
## FOLLOW_CORN     -0.50 -0.50    0  0.50   0.50
## WITH_PEA        -0.50  0.50    0  0.50  -0.50

Output Structures

plot_out = matrix(nrow=0, ncol=6)
colnames(plot_out) = c('CROP', 'SAMPLING', 'CONTRAST', 'BARCODE', 'BIOMASS', 'SCORE')

stats_out = matrix(nrow=0, ncol=20)
colnames(stats_out) = c('CROP', 'SAMPLING', 'CONTRAST', 'BARCODE', 'ORD_F', 'ORD_DF', 'ORD_P', 'YEAR_F', 'YEAR_DF', 'YEAR_P', 'YEAR_COEF', 'YEAR_SE', 'CONTRAST_COEF', 'CONTRAST_SE', 'CONTRAST_T', 'CONTRAST_P', 'AXIS_COEF', 'AXIS_SE', 'AXIS_T', 'AXIS_P')

# initiate data.frames for storing results
full_ord_results = update_full_ord_results()
contrast_ord_results = update_contrast_ord_results()
taxa_results = update_taxa_results()
ord_ls = list()  # for storing ordinations, including permutations.
sem_ls = list()

Corn

The workflow is, for each subset of crop, sampling time, and community:

  1. See if the community responds to rotation overall. If yes, continue.
  2. Check if the community responds to plant health-predicitive contrasts. If yes, continue.
  3. ID the taxa that most credibly respond to contrasts. If yes, continue.
  4. With bacteria, determine whether these taxa predict plant health independently of contrasts.

Due to low sample sizes (N=8 for each treatment * crop * sampling time combination across years), marginal effects (0.1 > p > 0.05), and taxa that individually respond to contrasts at 40% false discovery rates will be further investigated. This is reasonable as the goal in selecting taxa is to reduce noise in the dataset. This is a low-effort way to separating taxa that seem to respond to rotation from the ones that don’t.

Seedling

Rotation

As with univariate tests, we’ll first test whether our response, the microbial community, responds to the overall treatments. If so, we will test the contrasts that were also predictive of plant health (biomass). The dataset will be reduced to those taxa responding to the contrast

The output for each crop and stage will be a graph of:

  • y = biomass
  • x = microbial contrast score

Plus a sem graph showing the relationships leading to biomass.

First we see if these communities respond to the treatment.

Bacteria

crop = 'corn'
sampling = 'seedling'
constraint = 'ROTATION'
partial = 'YEAR'
barcode = '16S'
contrast = 'None'

nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
# Part 2 - Helper Functions. 
#' Subsets phylo object by crop and sampling. 
#' Builds Y = otu_table(ps_obj), X = sample_data(ps_obj)[constraint], and Z = sample_data(ps_obj)[partial] matrices
#' optionally uses a contrast for X as defined in contrast
#' Performs partial constrained weighted log-ratio analysis with pcwOrd::pcwOrd:
#' - row weights are log10 total reads
#' - column weights are mean relative abundances
#' - otu_table is center-log-ratio-transformed by easyCODA::CLR
#' Produces diagnostic plots and returns the pcwOrd object.
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name) 

# pcwOrd function
perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')

Here is the model PERMANOVA (pseudo-F) test, based on 9999 permutations.

full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
##   CROP SAMPLING BARCODE       RSQ    F_OBS   Df  P_VAL
## 1 corn seedling     16S 0.1085067 1.143856 5,34 0.0638
ord_ls[[nn]] = perms  # structure includes: permutation results, observed data (pcwOrd object).

ROTATION explains 10.851 of corn 16S community variation at seedling (p = 0.064). Therefore, we will test contrasts with the 16S dataset.

Fungi

barcode = 'ITS'

nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name) 

perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
##   CROP SAMPLING BARCODE       RSQ   F_OBS   Df P_VAL
## 1 corn seedling     ITS 0.1584024 2.07696 5,34 1e-04
ord_ls[[nn]] = perms  # structure includes: permutation results, observed data (pcwOrd object).

ROTATION explains 15.8402393 of corn ITS community variation at seedling (p = 10^{-4}). Therefore, we will test contrasts with the ITS dataset.

Contrasts

Follow Soy

Bacteria
contrast = 'FOLLOW_SOY'
barcode = '16S'

contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')

nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr) 

perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
##   CROP SAMPLING BARCODE   CONTRAST        RSQ     F_OBS   Df  P_VAL
## 1 corn seedling     16S FOLLOW_SOY 0.02237139 0.9274939 2,37 0.7581
ord_ls[[nn]] = perms

16S communities did not respond to ROTATION at seedling (p = 0.76). We will not test it further or reduce features. We’ll just save the scores for plotting.

Fungi
barcode = 'ITS'

contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')

nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr) 

perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)

Our contrast has a negative score on the constrained axis, so we will invert it.

contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
##   CROP SAMPLING BARCODE   CONTRAST        RSQ    F_OBS   Df  P_VAL
## 1 corn seedling     ITS FOLLOW_SOY 0.03801038 1.829664 2,37 0.0286
ord_ls[[nn]] = perms

ITS communities did respond to ROTATION at seedling (p = 0.029).

Predictive Communities

Here are the axis scores so far. Note we will NOT include 16S in the model.

response = 'AV_DW'

find_ords = c(crop, sampling, contrast) %>% 
  sapply(grepl, names(ord_ls)) %>% 
  apply(1, all)
local_ords = ord_ls[find_ords] %>% 
  lapply(extract2, 'observed')

sem_name = paste(crop, sampling, contrast, sep='_')

#' combine axis (standard) scores plus response, contrast, and partial data into a single data.frame for modeling.
#' contrast_row is the row of contrasts[[crop]]. Must be a matrix.
#cc = contrasts[[crop]][contrast, , drop=FALSE]
mod_df = generate_prediction_df(local_ords, 
                                crop, 
                                sampling, 
                                contrasts[[crop]][contrast, , drop=FALSE], 
                                partial, 
                                constraint, 
                                response, 
                                plot,
                                auto_invert=TRUE)  # if contrast and axis are negatively correlated, for consistency.
head(mod_df)

The SEM models are:

  • no_mediation: A null model where AV_DW responds only to FOLLOW_SOY
  • partial_mediation: A full model where AV_DW responds to both FOLLOW_SOY and the community
  • full_mediation: An intermediate model where AV_DW responds only to the community

In all models, YEAR is included as a predictor of AV_DW, as well.

Errors will be estimated by bootstrapping 1000 times, missing values using maximum likelihood imputation, and the overall models by maximum likelihood.

mod_df %<>% na.omit
no_mediation = {
  "
  # Model
  AV_DW ~ YEAR + c*CONTRAST
  AXIS_ITS ~ a*CONTRAST
  AXIS_ITS ~~ 0*AV_DW
  # Calculate pathways for easy interpretation
  direct := c
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

partial_mediation_ITS = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS + c*CONTRAST
  AXIS_ITS ~ a*CONTRAST
  
  direct := c
  total := c + (a*b)
  indirect := a*b
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_mediation_ITS = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS# + c*CONTRAST
  AXIS_ITS ~ a*CONTRAST
  
  # direct := c
  # total := c + (a*b)
  indirect := a*b
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

model_compares = anova(no_mediation, partial_mediation_ITS, full_mediation_ITS) %T>%
  print
## Chi-Squared Difference Test
## 
##                       Df    AIC    BIC   Chisq Chisq diff Df diff Pr(>Chisq)
## partial_mediation_ITS  1 150.44 163.96  0.0049                              
## no_mediation           2 163.60 175.42 15.1584     15.153       1  9.911e-05
## full_mediation_ITS     2 148.84 160.66  0.3981    -14.760       0           
##                          
## partial_mediation_ITS    
## no_mediation          ***
## full_mediation_ITS       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
keep_mod = which.min(model_compares$AIC) %>% 
  rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_mediation = no_mediation, 
                          partial_mediation_ITS = partial_mediation_ITS, 
                          full_mediation_ITS = full_mediation_ITS, 
                          df = mod_df)

The best model is full_mediation_ITS.

mod_summ = summary(get(keep_mod), std=TRUE)
## lavaan 0.6-5 ended normally after 26 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          7
##                                                       
##   Number of observations                            40
##   Number of missing patterns                         1
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.398
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.820
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   AV_DW ~                                                               
##     YEAR             -1.598    0.141  -11.356    0.000   -1.598   -0.811
##     AXIS_ITS   (b)   -0.375    0.097   -3.858    0.000   -0.375   -0.376
##   AXIS_ITS ~                                                            
##     FOLLOW_SOY (a)    1.522    0.253    6.023    0.000    1.522    0.689
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .AV_DW             2.397    0.258    9.293    0.000    2.397    2.433
##    .AXIS_ITS         -0.000    0.115   -0.000    1.000   -0.000   -0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .AV_DW             0.195    0.054    3.586    0.000    0.195    0.201
##    .AXIS_ITS          0.512    0.095    5.389    0.000    0.512    0.525
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     indirect         -0.570    0.178   -3.205    0.001   -0.570   -0.259

This is an excellent model.

lavaanPlot(model=get(keep_mod), coefs=TRUE)
Plot

This is a marginal plot, where year effects are removed for clarity of the axis effects.

select_barcode = 'ITS'
#' plot response ~ barcode_axis based on data in mod_df created above.
#' add shapes for partial and colors for constraint
#' response is optionally the residuals of response ~ partial.
#' returns a ggplot object for easy post-hoc formatting
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_effect(select_barcode, response, partial, constraint, contrast, mod_df, partial_response=TRUE, nn)

Candidate Taxa

The taxa that load most credibly onto this axis are:

select_barcode = 'ITS'

is_ITS = select_barcode=='ITS'
if (is_ITS) {
  depth = 'Species'
  guilds = funguild
} else {
  depth = 'Genus'
  guilds = NULL
}

ord_obj = paste(crop, sampling, select_barcode, contrast, sep='_') %>% 
  ord_ls[[.]]

#' Select the features in feature_F of ord_permutations$results that have an adjusted
#' p-value < max_p_adjust. Name these taxa based on tax_table(plylo_obj).
#' Include additional info like guilds and (contribution, sensu Greenacre 2013) scores. 
top_tax = id_top_tax(ord_obj,
                     phylo_obj[[select_barcode]], 
                     metric='feature_F',
                     phylo_depth=depth, 
                     max_p_adj=1,
                     max_p_val=1,
                     guilds=funguild,
                     auto_invert=TRUE) %>% 
  subset(pcAxis1^2 > 0.01)
tt_sort = order(top_tax$pcAxis1, decreasing=TRUE)
# subset(top_tax[tt_sort, ], pcAxis1^2 > 0.005)
# top_tax[tt_sort, ]
# tt_sort = order(top_tax$pcAxis1, decreasing=TRUE)
# top_tax %<>% .[tt_sort, ] %T>% print# %>% 
  # subset(pcAxis1^2 > 0.005) %T>% 
  # print
# subset(top_tax[tt_sort, ], p_val < 0.05)
taxa_results %<>% update_taxa_results(crop, sampling, select_barcode, contrast, top_tax[tt_sort, ])
##    CROP SAMPLING BARCODE   CONTRAST     ASV          PHYLUM
## 1  corn seedling     ITS FOLLOW_SOY   ASV_9      Ascomycota
## 2  corn seedling     ITS FOLLOW_SOY  ASV_15      Ascomycota
## 3  corn seedling     ITS FOLLOW_SOY  ASV_17      Ascomycota
## 4  corn seedling     ITS FOLLOW_SOY   ASV_8      Ascomycota
## 5  corn seedling     ITS FOLLOW_SOY ASV_183      Ascomycota
## 6  corn seedling     ITS FOLLOW_SOY  ASV_13      Ascomycota
## 7  corn seedling     ITS FOLLOW_SOY  ASV_54      Ascomycota
## 8  corn seedling     ITS FOLLOW_SOY   ASV_3      Ascomycota
## 9  corn seedling     ITS FOLLOW_SOY  ASV_43      Ascomycota
## 10 corn seedling     ITS FOLLOW_SOY  ASV_71      Ascomycota
## 11 corn seedling     ITS FOLLOW_SOY ASV_130   Olpidiomycota
## 12 corn seedling     ITS FOLLOW_SOY  ASV_62   Olpidiomycota
## 13 corn seedling     ITS FOLLOW_SOY  ASV_25      Ascomycota
## 14 corn seedling     ITS FOLLOW_SOY  ASV_69 Chytridiomycota
## 15 corn seedling     ITS FOLLOW_SOY  ASV_42   Olpidiomycota
##                          TAXA
## 1         Unknown Trichoderma
## 2          Unknown Drechslera
## 3        Gibberella acuminata
## 4         Unknown Trichoderma
## 5           Unknown Diaporthe
## 6          Clonostachys rosea
## 7        Unknown Microdochium
## 8          Unknown Helotiales
## 9     Unknown Chaetothyriales
## 10  Unknown Paraphaeosphaeria
## 11         Olpidium brassicae
## 12         Olpidium brassicae
## 13 Dactylonectria macrodidyma
## 14      Unknown Chytridiaceae
## 15         Olpidium brassicae
##                                                               GUILD       AXIS
## 1  Endophyte-Epiphyt-Fungal Parasite-Plant Pathogen-Wood Saprotroph  0.3821456
## 2                                                    Plant Pathogen  0.3485287
## 3                                                    Plant Pathogen  0.2347919
## 4  Endophyte-Epiphyt-Fungal Parasite-Plant Pathogen-Wood Saprotroph  0.1558201
## 5                                          Endophyte-Plant Pathogen  0.1442146
## 6                                                    Plant Pathogen  0.1182411
## 7                                          Endophyte-Plant Pathogen  0.1051201
## 8                                                                 - -0.1045408
## 9                                                                 - -0.1226169
## 10                                             Undefined Saprotroph -0.1324429
## 11                                                   Plant Pathogen -0.1337711
## 12                                                   Plant Pathogen -0.1404569
## 13                                                                - -0.1575138
## 14                                                                - -0.2395906
## 15                                                   Plant Pathogen -0.4016240
##            RSQ      F_OBS  P_VAL     P_ADJ
## 1  0.066253903  4.1951582 0.0510 0.5295923
## 2  0.036786539  1.4130944 0.2418 0.7321636
## 3  0.085186102  4.5277325 0.0450 0.5295923
## 4  0.012115576  1.7593752 0.1985 0.6918306
## 5  0.288689886 17.6989867 0.0002 0.0173200
## 6  0.070450526  3.1288029 0.0843 0.5514736
## 7  0.059048355  2.4321934 0.1210 0.6389390
## 8  0.008158889  0.3096316 0.5929 0.8662785
## 9  0.126679408  5.3997232 0.0230 0.3688519
## 10 0.071240975  2.9536943 0.0944 0.5523676
## 11 0.054676613  2.5409566 0.1171 0.6338037
## 12 0.018106654  1.6325839 0.1921 0.6918306
## 13 0.229494281 11.0374698 0.0026 0.1125800
## 14 0.119196577  7.9016380 0.0096 0.2256158
## 15 0.082827949  6.7648635 0.0143 0.2948524

We can see a number of pathogens associated with soybean as a previous crop. These all “contribute” 1% to the solution.

Rotation Length

Bacteria
contrast = 'ROTATION_LENGTH'
barcode = '16S'

contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')

nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr) 

perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
##   CROP SAMPLING BARCODE        CONTRAST       RSQ    F_OBS   Df  P_VAL
## 1 corn seedling     16S ROTATION_LENGTH 0.0316659 1.326651 2,37 0.0431
ord_ls[[nn]] = perms

ROTATION explains 3.17 of corn 16S community variation at seedling. This is an intermediate (p = 0.043) effect with a clear visual outcome in both this ordination in the response ordination. We will investigate taxa.

Fungi
barcode = 'ITS'

contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')

nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr) 

perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
##   CROP SAMPLING BARCODE        CONTRAST        RSQ    F_OBS   Df P_VAL
## 1 corn seedling     ITS ROTATION_LENGTH 0.06618771 3.307242 2,37 1e-04
ord_ls[[nn]] = perms

ROTATION explains 6.62 of corn ITS community variation at seedling. This is a strong (p = 10^{-4}) effect. We will investigate taxa.

Predictive Communities

Here are the axis scores combined with other model parameters.

find_ords = c(crop, sampling, contrast) %>% 
  sapply(grepl, names(ord_ls)) %>% 
  apply(1, all)
local_ords = ord_ls[find_ords] %>% 
  lapply(extract2, 'observed')

sem_name = paste(crop, sampling, contrast, sep='_')
mod_df = generate_prediction_df(local_ords, 
                                crop, 
                                sampling, 
                                contrasts[[crop]][contrast,,drop=FALSE], 
                                partial, 
                                constraint, 
                                response, 
                                plot,
                                auto_invert=TRUE)  # if contrast and axis are negatively correlated, for consistency.
head(mod_df)

The SEM models are:

  • no_mediation: A null model where AV_DW responds only to ROTATION_LENGTH
  • partial_mediation: A full model where AV_DW responds to both ROTATION_LENGTH and the community
  • full_mediation: An intermediate model where AV_DW responds only to the community

In all models, YEAR is included as a predictor of AV_DW, as well.

mod_df %<>% na.omit
no_mediation = {
  "
  # Model
  AV_DW ~ YEAR + c*CONTRAST
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  AXIS_ITS ~~ 0*AV_DW
  AXIS_16S ~~ 0*AV_DW
  AXIS_ITS ~~ AXIS_16S
  
  # Calculate pathways for easy interpretation
  direct := c
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

partial_mediation_full = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS + e*AXIS_16S + c*CONTRAST
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  AXIS_ITS ~~ AXIS_16S
    
  direct := c
  total := c + (a*b) + (d*e)
  indirect_its := a*b
  indirect_16s := d*e
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

partial_mediation_16S = {
  "
  AV_DW ~ YEAR + e*AXIS_16S + c*CONTRAST
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  AXIS_ITS ~~ 0*AV_DW
  AXIS_ITS ~~ AXIS_16S
    
  direct := c
  total := c + (d*e)
  indirect_16s := d*e
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

partial_mediation_ITS = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS + c*CONTRAST
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  AXIS_16S ~~ 0*AV_DW
  AXIS_ITS ~~ AXIS_16S
    
  direct := c
  total := c + (a*b)
  indirect_its := a*b
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_mediation_full = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS + e*AXIS_16S
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  AXIS_ITS ~~ AXIS_16S
  
  indirect_its := a*b
  indirect_16s := d*e
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_mediation_16S = {
  "
  AV_DW ~ YEAR + e*AXIS_16S
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  AXIS_ITS ~~ AXIS_16S
  AXIS_ITS ~~ 0*AV_DW
  
  indirect_16s := d*e
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_mediation_ITS = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  AXIS_ITS ~~ AXIS_16S
  AXIS_16S ~~ 0*AV_DW
  
  indirect_its := a*b
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

model_compares = anova(no_mediation, partial_mediation_full, partial_mediation_16S, partial_mediation_ITS, full_mediation_full, full_mediation_16S, full_mediation_ITS) %T>%
  print
## Chi-Squared Difference Test
## 
##                        Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## partial_mediation_full  2 221.39 243.34 0.0008                              
## partial_mediation_16S   3 220.74 241.00 1.3476    1.34675       1     0.2458
## partial_mediation_ITS   3 219.68 239.94 0.2872   -1.06038       0           
## full_mediation_full     3 219.51 239.78 0.1244   -0.16282       0           
## no_mediation            4 219.01 237.59 1.6238    1.49941       1     0.2208
## full_mediation_16S      4 219.82 238.40 2.4339    0.81010       0           
## full_mediation_ITS      4 218.98 237.55 1.5894   -0.84452       0
keep_mod = which.min(model_compares$AIC) %>% 
  rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_mediation = no_mediation, 
                          partial_mediation_ITS = partial_mediation_ITS, 
                          partial_mediation_16S = partial_mediation_16S, 
                          full_mediation_ITS = full_mediation_ITS, 
                          full_mediation_16S = full_mediation_16S, 
                          full_mediation_full = full_mediation_full,
                          df = mod_df)

The best model is full_mediation_ITS. However, this is only by 0.03 AIC points - within rounding error. We will select the no_mediation model as it is the “null”.

mod_summ = summary(get(keep_mod), std=TRUE)
## lavaan 0.6-5 ended normally after 26 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         11
##                                                       
##   Number of observations                            40
##   Number of missing patterns                         1
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 1.589
##   Degrees of freedom                                 4
##   P-value (Chi-square)                           0.811
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   AV_DW ~                                                               
##     YEAR             -1.602    0.176   -9.115    0.000   -1.602   -0.812
##     AXIS_ITS   (b)   -0.252    0.063   -3.995    0.000   -0.252   -0.252
##   AXIS_ITS ~                                                            
##     ROTATION_L (a)    1.484    0.130   11.451    0.000    1.484    0.751
##   AXIS_16S ~                                                            
##     ROTATION_L (d)    1.714    0.201    8.505    0.000    1.714    0.868
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .AXIS_ITS ~~                                                           
##    .AXIS_16S         -0.000    0.057   -0.003    0.997   -0.000   -0.001
##  .AV_DW ~~                                                              
##    .AXIS_16S          0.000                               0.000    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .AV_DW             2.404    0.338    7.117    0.000    2.404    2.436
##    .AXIS_ITS         -0.000    0.107   -0.000    1.000   -0.000   -0.000
##    .AXIS_16S         -0.000    0.078   -0.000    1.000   -0.000   -0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .AV_DW             0.270    0.106    2.537    0.011    0.270    0.277
##    .AXIS_ITS          0.425    0.093    4.561    0.000    0.425    0.435
##    .AXIS_16S          0.241    0.047    5.087    0.000    0.241    0.247
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     indirect_its     -0.374    0.091   -4.117    0.000   -0.374   -0.189

This is an excellent model.

lavaanPlot(model=get(keep_mod), coefs=TRUE)

Neither barcode community credibly explains seedling biomass better than ROTATION_LENGTH alone.

Combined Models

Combining both contrast axis scores into a data.frame to build a corn seedling biomass model.

contrast = c('FOLLOW_SOY', 'ROTATION_LENGTH')
barcode = c('ITS', 'ITS')

sem_name = sapply(seq_len(length(barcode)), function(x) paste(crop, sampling, barcode[x], contrast[x], sep='_'))

local_ords = ord_ls[sem_name] %>% 
  lapply(extract2, 'observed')
mod_df = lapply(contrast, function(x) {
  out = generate_prediction_df(local_ords, 
                         crop, 
                          sampling, 
                          contrasts[[crop]][x,,drop=FALSE], 
                          partial, 
                          constraint, 
                          response, 
                          plot,
                          auto_invert=TRUE)   # if contrast and axis are negatively correlated, for consistency.
  is_axis = grepl('AXIS', names(out))
  names(out)[is_axis] %<>% paste(x, sep='_')
  return(out)
}) %>% 
  set_names(contrast)

tt = mod_df[[1]]
if (length(mod_df) > 1) {
  for (i in 2:length(mod_df)) {
    merge_by = 'SAMPLE_ID'
    mm = mod_df[[i]]
    is_axis = grepl('AXIS', names(mm)) %>% 
      names(mm)[.]
    tt = merge(tt, mm[, c(merge_by, is_axis, contrast[i])], by=merge_by, all=TRUE)
  }
}
mod_df = tt
SEM

Here are the combined SEM results

mod_df %<>% na.omit
axis_name = paste0('AXIS_', barcode)
no_mediation = {
  "
  # Model
  AV_DW ~ YEAR + a*CONTRAST1 + b*CONTRAST2
  AXIS1_CONTRAST1 ~ c*CONTRAST1
  AXIS2_CONTRAST2 ~ d*CONTRAST2
  
  # covariances
  AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
  AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
  
  # Calculate pathways for easy interpretation
  direct := a + b
  "
} %>% 
  gsub('CONTRAST1', contrast[1], .) %>% 
  gsub('CONTRAST2', contrast[2], .) %>% 
  gsub('AXIS1', axis_name[1], .) %>% 
  gsub('AXIS2', axis_name[2], .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

partial_mediation_full = {
  "
  # Model
  AV_DW ~ YEAR + a*CONTRAST1 + b*CONTRAST2 + e*AXIS1_CONTRAST1 + f*AXIS2_CONTRAST2
  AXIS1_CONTRAST1 ~ c*CONTRAST1
  AXIS2_CONTRAST2 ~ d*CONTRAST2
  
  # covariances
  AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
  AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
  
  # Calculate pathways for easy interpretation
  direct := a + b
  indirect_CONTRAST1 := c*e
  indirect_CONTRAST2 := d*f
  total_CONTRAST1 := a + (c*e)
  total_CONTRAST2 := b + (d*f)
  "
} %>% 
  gsub('CONTRAST1', contrast[1], .) %>% 
  gsub('CONTRAST2', contrast[2], .) %>% 
  gsub('AXIS1', axis_name[1], .) %>% 
  gsub('AXIS2', axis_name[2], .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_mediation_full = {
  "
  # Model
  AV_DW ~ YEAR + e*AXIS1_CONTRAST1 + f*AXIS2_CONTRAST2
  AXIS1_CONTRAST1 ~ c*CONTRAST1
  AXIS2_CONTRAST2 ~ d*CONTRAST2
  
  # covariances
  AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
  AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
  
  # Calculate pathways for easy interpretation
  #direct := a + b
  indirect_CONTRAST1 := c*e
  indirect_CONTRAST2 := d*f
  #total_CONTRAST1 := a + (c*e)
  #total_CONTRAST2 := b + (d*f)
  "
} %>% 
  gsub('CONTRAST1', contrast[1], .) %>% 
  gsub('CONTRAST2', contrast[2], .) %>% 
  gsub('AXIS1', axis_name[1], .) %>% 
  gsub('AXIS2', axis_name[2], .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

partial_follow_full_length = {
  "
  # Model
  AV_DW ~ YEAR + a*CONTRAST1 + e*AXIS1_CONTRAST1 + f*AXIS2_CONTRAST2
  AXIS1_CONTRAST1 ~ c*CONTRAST1
  AXIS2_CONTRAST2 ~ d*CONTRAST2
  
  # covariances
  AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
  AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
  
  # Calculate pathways for easy interpretation
  direct := a#
  indirect_CONTRAST1 := c*e
  indirect_CONTRAST2 := d*f
  total_CONTRAST1 := a + (c*e)
  "
} %>% 
  gsub('CONTRAST1', contrast[1], .) %>% 
  gsub('CONTRAST2', contrast[2], .) %>% 
  gsub('AXIS1', axis_name[1], .) %>% 
  gsub('AXIS2', axis_name[2], .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_follow_partial_length = {
  "
  # Model
  AV_DW ~ YEAR + b*CONTRAST2 + e*AXIS1_CONTRAST1 + f*AXIS2_CONTRAST2
  AXIS1_CONTRAST1 ~ c*CONTRAST1
  AXIS2_CONTRAST2 ~ d*CONTRAST2
  
  # covariances
  AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
  AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
  
  # Calculate pathways for easy interpretation
  direct := b
  indirect_CONTRAST1 := c*e
  indirect_CONTRAST2 := d*f
  total_CONTRAST2 := b + (d*f)
  "
} %>% 
  gsub('CONTRAST1', contrast[1], .) %>% 
  gsub('CONTRAST2', contrast[2], .) %>% 
  gsub('AXIS1', axis_name[1], .) %>% 
  gsub('AXIS2', axis_name[2], .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_follow_none_length = {
  "
  # Model
  AV_DW ~ YEAR + b*CONTRAST2 + e*AXIS1_CONTRAST1
  AXIS1_CONTRAST1 ~ c*CONTRAST1
  AXIS2_CONTRAST2 ~ d*CONTRAST2
  
  # covariances
  AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
  AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
  
  # Calculate pathways for easy interpretation
  direct := b
  indirect_CONTRAST1 := c*e
  "
} %>% 
  gsub('CONTRAST1', contrast[1], .) %>% 
  gsub('CONTRAST2', contrast[2], .) %>% 
  gsub('AXIS1', axis_name[1], .) %>% 
  gsub('AXIS2', axis_name[2], .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_follow_only = {
  "
  # Model
  AV_DW ~ YEAR + e*AXIS1_CONTRAST1
  AXIS1_CONTRAST1 ~ c*CONTRAST1
  AXIS2_CONTRAST2 ~ d*CONTRAST2
  
  # covariances
  AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
  AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
  
  # Calculate pathways for easy interpretation
  indirect_CONTRAST1 := c*e
  "
} %>% 
  gsub('CONTRAST1', contrast[1], .) %>% 
  gsub('CONTRAST2', contrast[2], .) %>% 
  gsub('AXIS1', axis_name[1], .) %>% 
  gsub('AXIS2', axis_name[2], .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

model_compares = anova(full_follow_partial_length, partial_follow_full_length, full_mediation_full, partial_mediation_full, no_mediation, full_follow_none_length, full_follow_only) %T>%
  print
## Chi-Squared Difference Test
## 
##                            Df    AIC    BIC  Chisq Chisq diff Df diff
## partial_mediation_full      4 226.72 250.37 15.072                   
## full_follow_partial_length  5 224.74 246.70 15.091     0.0188       1
## partial_follow_full_length  5 229.24 251.20 19.592     4.5018       0
## full_mediation_full         6 227.49 247.75 19.838     0.2457       1
## no_mediation                6 232.61 252.88 24.961     5.1228       0
## full_follow_none_length     6 224.32 244.59 16.670    -8.2909       0
## full_follow_only            7 225.71 244.29 20.064     3.3946       1
##                            Pr(>Chisq)  
## partial_mediation_full                 
## full_follow_partial_length    0.89089  
## partial_follow_full_length             
## full_mediation_full           0.62014  
## no_mediation                           
## full_follow_none_length                
## full_follow_only              0.06541 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
keep_mod = which.min(model_compares$AIC) %>% 
  rownames(model_compares)[.]
if (length(sem_name) > 1) sem_name = paste(crop, sampling, paste(unique(barcode), collapse='_'), 'COMBINED', sep='_')
sem_ls[[sem_name]] = list(full_follow_partial_length = full_follow_partial_length,
                          partial_follow_full_length = full_follow_partial_length, 
                          full_mediation_full = full_follow_partial_length, 
                          partial_mediation_full = full_follow_partial_length, 
                          no_mediation = full_follow_partial_length, 
                          full_follow_none_length = full_follow_partial_length, 
                          full_follow_only = full_follow_partial_length,
                          df = mod_df)

The best model is:

lavaanPlot(model=get(keep_mod), coefs=TRUE)

With a poor model fit:

anova(get(keep_mod))

So we will use just the FOLLOW_SOY model.

Plot
select_barcode = 'ITS'
#' plot response ~ barcode_axis based on data in mod_df created above.
#' add shapes for partial and colors for constraint
#' response is optionally the residuals of response ~ partial.
#' returns a ggplot object for easy post-hoc formatting

nn = keep_mod
pp = c(partial, 'ROTATION_LENGTH')
plot_effect(select_barcode, response, pp, constraint, contrast[1], mod_df, partial_response=TRUE, nn)

Flowering

Rotation

First we see if these communities respond to the treatment.

Bacteria

crop = 'corn'
sampling = 'flowering'
constraint = 'ROTATION'
partial = 'YEAR'
barcode = '16S'
contrast = 'None'

nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name) 

perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
##   CROP  SAMPLING BARCODE       RSQ    F_OBS   Df  P_VAL
## 1 corn flowering     16S 0.1258768 1.267429 5,32 0.0037
ord_ls[[nn]] = perms  # structure includes: permutation results, observed data (pcwOrd object).

ROTATION explains 12.6 of corn 16S community variation at flowering (p = 0.0037). Therefore, we will test contrasts with the 16S dataset.

Fungi

barcode = 'ITS'

nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name) 

perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
##   CROP  SAMPLING BARCODE       RSQ    F_OBS   Df  P_VAL
## 1 corn flowering     ITS 0.1386246 1.601597 5,34 0.0018
ord_ls[[nn]] = perms  # structure includes: permutation results, observed data (pcwOrd object).

ROTATION explains 13.9 of corn ITS community variation at flowering (p = 0.0018). Therefore, we will test contrasts with the ITS dataset.

Contrasts

Rotation Length

Bacteria
contrast = 'ROTATION_LENGTH'
barcode = '16S'

contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')

nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr) 

perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
##   CROP  SAMPLING BARCODE        CONTRAST        RSQ    F_OBS   Df  P_VAL
## 1 corn flowering     16S ROTATION_LENGTH 0.03363979 1.327731 2,35 0.0522
ord_ls[[nn]] = perms

ROTATION_LENGTH explains 3.36 of corn 16S community variation at flowering (p = 0.052). We will investigate taxa.

Fungi
barcode = 'ITS'

contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')

nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr) 

perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
##   CROP  SAMPLING BARCODE        CONTRAST        RSQ    F_OBS   Df  P_VAL
## 1 corn flowering     ITS ROTATION_LENGTH 0.03785292 1.674345 2,37 0.0327
ord_ls[[nn]] = perms

ROTATION_LENGTH explains 3.79 of corn ITS community variation at flowering. This is a strong (p = 0.033) effect. We will investigate taxa.

Predictive Community

find_ords = c(crop, sampling, contrast) %>% 
  sapply(grepl, names(ord_ls)) %>% 
  apply(1, all)
local_ords = ord_ls[find_ords] %>% 
  lapply(extract2, 'observed')

sem_name = paste(crop, sampling, contrast, sep='_')
mod_df = generate_prediction_df(local_ords, 
                                crop, 
                                sampling, 
                                contrasts[[crop]][contrast, , drop=FALSE], 
                                partial, 
                                constraint, 
                                response, 
                                plot,
                                auto_invert=TRUE)  # if contrast and axis are negatively correlated, for consistency.
# Add T1 biomass
seed_bio = subset(plot, SAMPLING=='seedling' & CROP=='corn', select=c('SAMPLE_ID', 'AV_DW'))
colnames(seed_bio)[2] = 'T1_AV_DW'
seed_bio$SAMPLE_ID %<>% as.character %>% 
  strsplit('_') %>% 
  sapply(function(x) paste(x[1:3], collapse='_'))
mod_df$SID = mod_df$SAMPLE_ID %>% 
  as.character %>% 
  strsplit('_') %>% 
  sapply(function(x) paste(x[1:3], collapse='_'))

mod_df %<>% merge(seed_bio, by.x='SID', by.y='SAMPLE_ID')

head(mod_df)

The SEM models are:

  • no_mediation: A null model where AV_DW responds only to ROTATION_LENGTH
  • partial_mediation: A full model where AV_DW responds to both ROTATION_LENGTH and the community
  • full_mediation: An intermediate model where AV_DW responds only to the community

In all models, YEAR is included as a predictor of AV_DW, as well.

mod_df %<>% na.omit
no_mediation = {
  "
  # Model
  AV_DW ~ YEAR + c*CONTRAST + T1_AV_DW
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  AXIS_ITS ~~ 0*AV_DW
  AXIS_16S ~~ 0*AV_DW
  AXIS_ITS ~~ 0*AXIS_16S
  
  # Calculate pathways for easy interpretation
  direct := c
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

partial_mediation_full = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS + e*AXIS_16S + c*CONTRAST + T1_AV_DW
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  #AXIS_ITS ~~ AXIS_16S
    
  direct := c
  total := c + (a*b) + (d*e)
  indirect_its := a*b
  indirect_16s := d*e
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

partial_mediation_16S = {
  "
  AV_DW ~ YEAR + e*AXIS_16S + c*CONTRAST + T1_AV_DW
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  AXIS_ITS ~~ 0*AV_DW
    
  direct := c
  total := c + (d*e)
  indirect_16s := d*e
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

partial_mediation_ITS = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS + c*CONTRAST + T1_AV_DW
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  AXIS_16S ~~ 0*AV_DW
    
  direct := c
  total := c + (a*b)
  indirect_its := a*b
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_mediation_full = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS + e*AXIS_16S + T1_AV_DW
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  #AXIS_ITS ~~ AXIS_16S
  
  indirect_its := a*b
  indirect_16s := d*e
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_mediation_16S = {
  "
  AV_DW ~ YEAR + e*AXIS_16S + T1_AV_DW
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  #AXIS_ITS ~~ AXIS_16S
  AXIS_ITS ~~ 0*AV_DW
  
  indirect_16s := d*e
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_mediation_ITS = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS + T1_AV_DW
  AXIS_ITS ~ a*CONTRAST
  AXIS_16S ~ d*CONTRAST
  
  # covariances
  #AXIS_ITS ~~ AXIS_16S
  AXIS_16S ~~ 0*AV_DW
  
  indirect_its := a*b
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

model_compares = anova(no_mediation, partial_mediation_full, partial_mediation_16S, partial_mediation_ITS, full_mediation_full, full_mediation_16S, full_mediation_ITS) %T>%
  print
## Chi-Squared Difference Test
## 
##                        Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)  
## partial_mediation_full  5 219.75 241.04 2.5267                                
## partial_mediation_16S   6 217.89 237.54 2.6636     0.1369       1    0.71142  
## partial_mediation_ITS   6 222.46 242.11 7.2329     4.5693       0             
## full_mediation_full     6 217.76 237.42 2.5369    -4.6961       0             
## no_mediation            7 220.47 238.48 7.2378     4.7009       1    0.03015 *
## full_mediation_16S      7 215.91 233.92 2.6763    -4.5616       0             
## full_mediation_ITS      7 222.50 240.51 9.2672     6.5909       0             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
keep_mod = which.min(model_compares$AIC) %>% 
  rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_mediation = no_mediation, 
                          partial_mediation_ITS = partial_mediation_ITS, 
                          partial_mediation_16S = partial_mediation_16S, 
                          full_mediation_ITS = full_mediation_ITS, 
                          full_mediation_16S = full_mediation_16S, 
                          full_mediation_full = full_mediation_full,
                          df = mod_df)

The best model is full_mediation_16S. ITS models are the worst - worse than no_mediation.

mod_summ = summary(get(keep_mod), std=TRUE)
## lavaan 0.6-5 ended normally after 33 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         11
##                                                       
##   Number of observations                            38
##   Number of missing patterns                         1
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 2.676
##   Degrees of freedom                                 7
##   P-value (Chi-square)                           0.913
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   AV_DW ~                                                               
##     YEAR             -0.922    0.316   -2.921    0.003   -0.922   -0.462
##     AXIS_16S   (e)   -0.281    0.087   -3.244    0.001   -0.281   -0.278
##     T1_AV_DW          3.881    2.068    1.877    0.061    3.881    0.366
##   AXIS_ITS ~                                                            
##     ROTATION_L (a)    1.618    0.209    7.733    0.000    1.618    0.794
##   AXIS_16S ~                                                            
##     ROTATION_L (d)    1.601    0.335    4.773    0.000    1.601    0.786
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .AV_DW ~~                                                              
##    .AXIS_ITS          0.000                               0.000    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .AV_DW             0.661    0.822    0.805    0.421    0.661    0.664
##    .AXIS_ITS          0.032    0.102    0.313    0.754    0.032    0.032
##    .AXIS_16S          0.032    0.109    0.289    0.773    0.032    0.032
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .AV_DW             0.247    0.062    3.988    0.000    0.247    0.249
##    .AXIS_ITS          0.359    0.086    4.168    0.000    0.359    0.369
##    .AXIS_16S          0.372    0.094    3.968    0.000    0.372    0.382
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     indirect_16s     -0.449    0.168   -2.674    0.007   -0.449   -0.218
lavaanPlot(model=get(keep_mod), coefs=TRUE)
Plot
select_barcode = '16S'
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_effect(select_barcode, response, partial, constraint, contrast, mod_df, partial_response=TRUE, nn)

Candidate Taxa

The taxa that load most credibly onto this axis are:

select_barcode = '16S'

is_ITS = select_barcode=='ITS'
if (is_ITS) {
  depth = 'Species'
  guilds = funguild
} else {
  depth = 'Genus'
  guilds = NULL
}

ord_obj = paste(crop, sampling, select_barcode, contrast, sep='_') %>% 
  ord_ls[[.]]

top_tax = id_top_tax(ord_obj,
                     phylo_obj[[select_barcode]], 
                     metric='feature_F',
                     phylo_depth=depth, 
                     max_p_adj=1,
                     max_p_val=1,
                     guilds=NULL,
                     auto_invert=TRUE) %>% 
  subset(pcAxis1^2 > 0.01)
tt_sort = order(top_tax$pcAxis1, decreasing=TRUE)
# subset(top_tax[tt_sort, ], pcAxis1^2 > 0.005)
# top_tax[tt_sort, ]
# tt_sort = order(top_tax$pcAxis1, decreasing=TRUE)
# top_tax %<>% .[tt_sort, ] %T>% print# %>% 
  # subset(pcAxis1^2 > 0.005) %T>% 
  # print
# subset(top_tax[tt_sort, ], p_val < 0.05)
taxa_results %<>% update_taxa_results(crop, sampling, select_barcode, contrast, top_tax[tt_sort, ])
##    CROP  SAMPLING BARCODE        CONTRAST     ASV          PHYLUM
## 1  corn flowering     16S ROTATION_LENGTH ASV_204  Proteobacteria
## 2  corn flowering     16S ROTATION_LENGTH ASV_222  Actinobacteria
## 3  corn flowering     16S ROTATION_LENGTH ASV_479  Actinobacteria
## 4  corn flowering     16S ROTATION_LENGTH ASV_136  Actinobacteria
## 5  corn flowering     16S ROTATION_LENGTH ASV_460 Verrucomicrobia
## 6  corn flowering     16S ROTATION_LENGTH ASV_307  Actinobacteria
## 7  corn flowering     16S ROTATION_LENGTH ASV_238  Proteobacteria
## 8  corn flowering     16S ROTATION_LENGTH ASV_284  Proteobacteria
## 9  corn flowering     16S ROTATION_LENGTH ASV_236  Proteobacteria
## 10 corn flowering     16S ROTATION_LENGTH ASV_183 Verrucomicrobia
## 11 corn flowering     16S ROTATION_LENGTH ASV_149  Actinobacteria
## 12 corn flowering     16S ROTATION_LENGTH ASV_417  Actinobacteria
## 13 corn flowering     16S ROTATION_LENGTH ASV_103  Actinobacteria
## 14 corn flowering     16S ROTATION_LENGTH  ASV_86  Actinobacteria
## 15 corn flowering     16S ROTATION_LENGTH  ASV_63 Verrucomicrobia
## 16 corn flowering     16S ROTATION_LENGTH  ASV_79  Actinobacteria
## 17 corn flowering     16S ROTATION_LENGTH ASV_122  Actinobacteria
## 18 corn flowering     16S ROTATION_LENGTH ASV_113 Verrucomicrobia
## 19 corn flowering     16S ROTATION_LENGTH  ASV_74 Verrucomicrobia
## 20 corn flowering     16S ROTATION_LENGTH ASV_242 Verrucomicrobia
## 21 corn flowering     16S ROTATION_LENGTH ASV_384 Verrucomicrobia
## 22 corn flowering     16S ROTATION_LENGTH ASV_115  Actinobacteria
##                                                           TAXA GUILD       AXIS
## 1  Burkholderiaceae Burkholderia-Caballeronia-Paraburkholderia    NA  0.2692900
## 2                                    Nocardioidaceae Kribbella    NA  0.2146798
## 3                             Pseudonocardiaceae Amycolatopsis    NA  0.1985530
## 4                            Solirubrobacteraceae Conexibacter    NA  0.1826547
## 5                                 Rubritaleaceae Luteolibacter    NA  0.1593833
## 6                             Intrasporangiaceae Lapillicoccus    NA  0.1587951
## 7                                  Burkholderiaceae Variovorax    NA  0.1442220
## 8  Burkholderiaceae Burkholderia-Caballeronia-Paraburkholderia    NA  0.1273779
## 9                                  Burkholderiaceae Variovorax    NA  0.1240912
## 10                  Chthoniobacteraceae Candidatus_Udaeobacter    NA  0.1156731
## 11                              Cellulomonadaceae Cellulomonas    NA  0.1111435
## 12                           Propionibacteriaceae Microlunatus    NA  0.1097302
## 13                                         Gaiellaceae Gaiella    NA  0.1052552
## 14                              Streptomycetaceae Streptomyces    NA  0.1029040
## 15                  Chthoniobacteraceae Candidatus_Udaeobacter    NA -0.1122029
## 16                            Micrococcaceae Pseudarthrobacter    NA -0.1194276
## 17                            Intrasporangiaceae Lapillicoccus    NA -0.1325373
## 18                  Chthoniobacteraceae Candidatus_Udaeobacter    NA -0.1406555
## 19                  Chthoniobacteraceae Candidatus_Udaeobacter    NA -0.1476605
## 20                  Chthoniobacteraceae Candidatus_Udaeobacter    NA -0.1658830
## 21                  Chthoniobacteraceae Candidatus_Udaeobacter    NA -0.1736827
## 22                           Propionibacteriaceae Microlunatus    NA -0.1872172
##           RSQ     F_OBS  P_VAL     P_ADJ
## 1  0.26980693 13.857270 0.0026 0.1924000
## 2  0.21639662  9.707878 0.0039 0.2405000
## 3  0.39011518 22.641996 0.0004 0.0740000
## 4  0.08309827  3.366535 0.0939 0.8447170
## 5  0.05676092  2.113783 0.1613 0.8447170
## 6  0.40386249 23.715336 0.0003 0.0740000
## 7  0.09121844  3.840779 0.0433 0.8447170
## 8  0.19690625  9.276857 0.0081 0.3746250
## 9  0.12244231  5.086227 0.0300 0.8411333
## 10 0.06752865  2.744123 0.1167 0.8447170
## 11 0.08181198  3.742637 0.0703 0.8447170
## 12 0.05304831  2.381707 0.1106 0.8447170
## 13 0.03039700  1.106445 0.2962 0.8447170
## 14 0.25836391 13.093155 0.0017 0.1572500
## 15 0.01402708  0.498419 0.4695 0.8447170
## 16 0.05473083  2.317797 0.1466 0.8447170
## 17 0.08140144  4.228837 0.0529 0.8447170
## 18 0.10437767  4.200901 0.0500 0.8447170
## 19 0.06641153  2.625184 0.0989 0.8447170
## 20 0.03452340  1.326884 0.2481 0.8447170
## 21 0.04981441  1.837217 0.1809 0.8447170
## 22 0.13573839  5.526551 0.0303 0.8411333

Combined - Update manually

Full model predicting yield based on above results.

Approach:

  1. Combine dataframes with yield
  2. Compare the no-microbe model with the full-microbe model
sem_name = 'corn_combined'

models = c('corn_seedling_FOLLOW_SOY', 
           'corn_flowering_ROTATION_LENGTH')



plot_sub = c('SAMPLE_ID', 'PLOT', 'YEAR', 'CROP', 'CROPYIELD') %>% 
  plot[, .] %>% 
  subset(!is.na(CROPYIELD))
plot_sub$SID = with(plot_sub, paste(CROP, YEAR, PLOT, sep='_'))

seed = sem_ls[[models[1]]]$df %>% 
  extract(c('SAMPLE_ID', 'AXIS_ITS', 'FOLLOW_SOY'))
colnames(seed)[c(2,3)] %<>% paste('T1', ., sep='_')
seed$SID = seed$SAMPLE_ID %>% 
  as.character %>% 
  strsplit('_') %>% 
  lapply(extract, 1:3) %>% 
  lapply(paste, collapse='_')
seed$SAMPLE_ID = NULL


flow = sem_ls[[models[2]]]$df %>% 
  extract(c('SID', 'AXIS_16S', 'ROTATION_LENGTH', 'AV_DW', 'T1_AV_DW'))
colnames(flow)[c(2:4)] %<>% paste('T2', ., sep='_')

mod_df = merge(plot_sub, seed, by='SID') %>% 
  merge(flow, by='SID')

is_numeric = sapply(mod_df, is.numeric)
mod_df[is_numeric] %<>% scale

CHECKHERE

mod_df %<>% na.omit
no_microbe = {
  "
  CROPYIELD ~ YEAR + T2_AV_DW + T2_ROTATION_LENGTH
  T2_AV_DW ~ YEAR + T2_ROTATION_LENGTH + T1_AV_DW
  T1_AV_DW ~ YEAR + T1_FOLLOW_SOY
  
  T2_AXIS_16S ~ T2_ROTATION_LENGTH
  T1_AXIS_ITS ~ T1_FOLLOW_SOY
  "
} %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_microbe = {
  "
  CROPYIELD ~ YEAR + a*T2_AV_DW + T2_ROTATION_LENGTH
  T2_AV_DW ~ YEAR + b*T1_AV_DW + c*T2_AXIS_16S 
  T1_AV_DW ~ YEAR + d*T1_AXIS_ITS
  
  T2_AXIS_16S ~ T2_ROTATION_LENGTH
  T1_AXIS_ITS ~ T1_FOLLOW_SOY
  
  # microbe paths
  t1_microbe := a*b*d
  t2_microbe := a*c
  "
} %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

biomass_only = {
  "
  CROPYIELD ~ YEAR + T2_AV_DW
  T2_AV_DW ~ YEAR + T1_AV_DW
  "
} %>%
  sem(mod_df, se=sem_error, std.ov=TRUE, missing='ML')

model_compares = anova(no_microbe, full_microbe) %T>% print
## Chi-Squared Difference Test
## 
##              Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## no_microbe   12 316.39 354.06 34.262                              
## full_microbe 15 301.56 334.31 25.428    -8.8337       3          1
keep_mod = which.min(model_compares$AIC) %>% 
  rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_microbe = no_microbe,
                          full_microbe = full_microbe,
                          df = mod_df)

full_microbe is the best, by -14.83 AIC units.

lavaanPlot(model=get(keep_mod), coefs=TRUE, stand=TRUE)
summary(full_microbe, std=TRUE)
## lavaan 0.6-5 ended normally after 44 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         20
##                                                       
##   Number of observations                            38
##   Number of missing patterns                         1
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                25.428
##   Degrees of freedom                                15
##   P-value (Chi-square)                           0.044
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   CROPYIELD ~                                                           
##     YEAR              2.385    0.176   13.571    0.000    2.385    1.221
##     T2_AV_DW   (a)    0.401    0.110    3.631    0.000    0.401    0.404
##     T2_ROTATIO       -0.154    0.070   -2.214    0.027   -0.154   -0.156
##   T2_AV_DW ~                                                            
##     YEAR             -0.922    0.331   -2.783    0.005   -0.922   -0.469
##     T1_AV_DW   (b)    0.369    0.204    1.814    0.070    0.369    0.372
##     T2_AXIS_16 (c)   -0.281    0.086   -3.270    0.001   -0.281   -0.282
##   T1_AV_DW ~                                                            
##     YEAR             -1.677    0.149  -11.238    0.000   -1.677   -0.845
##     T1_AXIS_IT (d)   -0.346    0.108   -3.204    0.001   -0.346   -0.344
##   T2_AXIS_16S ~                                                         
##     T2_ROTATIO        0.786    0.168    4.676    0.000    0.786    0.786
##   T1_AXIS_ITS ~                                                         
##     T1_FOLLOW_        0.690    0.116    5.974    0.000    0.690    0.690
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .CROPYIELD        -3.641    0.253  -14.409    0.000   -3.641   -3.733
##    .T2_AV_DW          1.408    0.475    2.966    0.003    1.408    1.432
##    .T1_AV_DW          2.559    0.275    9.316    0.000    2.559    2.582
##    .T2_AXIS_16S       0.000    0.100    0.000    1.000    0.000    0.000
##    .T1_AXIS_ITS       0.000    0.117    0.000    1.000    0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .CROPYIELD         0.081    0.017    4.650    0.000    0.081    0.085
##    .T2_AV_DW          0.247    0.061    4.064    0.000    0.247    0.256
##    .T1_AV_DW          0.177    0.055    3.242    0.001    0.177    0.180
##    .T2_AXIS_16S       0.372    0.091    4.102    0.000    0.372    0.382
##    .T1_AXIS_ITS       0.510    0.094    5.423    0.000    0.510    0.523
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     t1_microbe       -0.051    0.027   -1.926    0.054   -0.051   -0.052
##     t2_microbe       -0.113    0.051   -2.185    0.029   -0.113   -0.114

Soy

Continuing on to soybean…

As with univariate tests, we’ll first test whether our response, the microbial community, responds to the overall treatments. If so, we will test the contrasts that were also predictive of plant health (biomass). The dataset will be reduced to those taxa responding to the contrast. Output will be the same as for corn.

First we see if these communities respond to the treatment.

Seedling

Rotation

Bacteria

crop = 'soy'
sampling = 'seedling'
constraint = 'ROTATION'
partial = 'YEAR'
barcode = '16S'
contrast = 'None'

nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name) 

perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
##   CROP SAMPLING BARCODE      RSQ    F_OBS   Df  P_VAL
## 1  soy seedling     16S 0.109189 1.137291 5,33 0.1555
ord_ls[[nn]] = perms  # structure includes: permutation results, observed data (pcwOrd object).

ROTATION explains 10.9 of soy 16S community variation at seedling (p = 0.16). Therefore, we will test not contrasts with the 16S dataset.

Fungi

barcode = 'ITS'

nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name) 

perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
##   CROP SAMPLING BARCODE       RSQ    F_OBS   Df P_VAL
## 1  soy seedling     ITS 0.2026763 2.315203 5,33 1e-04
ord_ls[[nn]] = perms  # structure includes: permutation results, observed data (pcwOrd object).

ROTATION explains 20.3 of soy ITS community variation at seedling (p = 10^{-4}). Therefore, we will test contrasts with the ITS dataset.

Contrasts

Follow Corn

Bacteria

For compatibility with downstream code

contrast = 'FOLLOW_CORN'
barcode = '16S'

contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')

nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr) 

perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)

The contrast has a negative score, so we will invert it.

contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
##   CROP SAMPLING BARCODE    CONTRAST       RSQ    F_OBS   Df  P_VAL
## 1  soy seedling     16S FOLLOW_CORN 0.0269337 1.108991 2,36 0.3428
ord_ls[[nn]] = perms

16S communities did not respond to ROTATION at seedling (p = 0.34), as expected based on full rotation/treatment results. We will not test it further or reduce features. We’ll just save the scores for plotting.

Fungi
contrast = 'FOLLOW_CORN'
barcode = 'ITS'

contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')

nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr) 

perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)

The contrast has a negative score, so we will invert it.

contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
##   CROP SAMPLING BARCODE    CONTRAST        RSQ    F_OBS   Df P_VAL
## 1  soy seedling     ITS FOLLOW_CORN 0.07422715 3.141276 2,36 2e-04
ord_ls[[nn]] = perms

ITS communities did respond to ROTATION at seedling (p = 210^{-4}). We will see if these communities predict soy biomass.

Predictive Communities

Here are the axis scores so far. Note we will NOT include 16S in the model.

find_ords = c(crop, sampling, contrast) %>% 
  sapply(grepl, names(ord_ls)) %>% 
  apply(1, all)
local_ords = ord_ls[find_ords] %>% 
  lapply(extract2, 'observed')

sem_name = paste(crop, sampling, contrast, sep='_')

#' combine axis (standard) scores plus response, contrast, and partial data into a single data.frame for modeling.
#' contrast_row is the row of contrasts[[crop]]. Must be a matrix.
mod_df = generate_prediction_df(local_ords, 
                                crop, 
                                sampling, 
                                contrasts[[crop]][contrast,,drop=FALSE], 
                                partial, 
                                constraint, 
                                response, 
                                plot,
                                auto_invert=TRUE)  # if contrast and axis are negatively correlated, for consistency.
head(mod_df)

Models are as previously described.

mod_df %<>% na.omit
no_mediation = {
  "
  # Model
  AV_DW ~ YEAR + c*CONTRAST
  AXIS_ITS ~ a*CONTRAST
  AXIS_ITS ~~ 0*AV_DW
  # Calculate pathways for easy interpretation
  direct := c
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

partial_mediation_ITS = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS + c*CONTRAST
  AXIS_ITS ~ a*CONTRAST
  
  direct := c
  total := c + (a*b)
  indirect := a*b
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_mediation_ITS = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS# + c*CONTRAST
  AXIS_ITS ~ a*CONTRAST
  
  # direct := c
  # total := c + (a*b)
  indirect := a*b
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

model_compares = anova(no_mediation, partial_mediation_ITS, full_mediation_ITS) %T>%
  print
## Chi-Squared Difference Test
## 
##                       Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)  
## partial_mediation_ITS  1 166.61 179.92 0.2246                                
## no_mediation           2 168.07 179.71 3.6808     3.4562       1    0.06302 .
## full_mediation_ITS     2 164.99 176.63 0.6025    -3.0783       0             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
keep_mod = which.min(model_compares$AIC) %>% 
  rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_mediation = no_mediation, 
                          partial_mediation_ITS = partial_mediation_ITS, 
                          full_mediation_ITS = full_mediation_ITS, 
                          df = mod_df)

The best model is full_mediation_ITS.

mod_summ = summary(get(keep_mod), std=TRUE)
## lavaan 0.6-5 ended normally after 21 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          7
##                                                       
##   Number of observations                            39
##   Number of missing patterns                         1
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.602
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.740
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   AV_DW ~                                                               
##     YEAR             -1.165    0.216   -5.392    0.000   -1.165   -0.597
##     AXIS_ITS   (b)   -0.423    0.117   -3.621    0.000   -0.423   -0.428
##   AXIS_ITS ~                                                            
##     FOLLOW_COR (a)    1.750    0.228    7.693    0.000    1.750    0.790
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .AV_DW             1.763    0.365    4.832    0.000    1.763    1.807
##    .AXIS_ITS         -0.022    0.100   -0.225    0.822   -0.022   -0.023
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .AV_DW             0.450    0.081    5.548    0.000    0.450    0.472
##    .AXIS_ITS          0.366    0.098    3.726    0.000    0.366    0.376
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     indirect         -0.740    0.232   -3.185    0.001   -0.740   -0.338

This is an excellent model.

lavaanPlot(model=get(keep_mod), coefs=TRUE)
Plot
select_barcode = 'ITS'
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_effect(select_barcode, response, partial, constraint, contrast, mod_df, partial_response=TRUE, nn)

Candidate Taxa

The taxa that load most credibly onto this axis are:

select_barcode = 'ITS'

is_ITS = select_barcode=='ITS'
if (is_ITS) {
  depth = 'Species'
  guilds = funguild
} else {
  depth = 'Genus'
  guilds = NULL
}

ord_obj = paste(crop, sampling, select_barcode, contrast, sep='_') %>% 
  ord_ls[[.]]

top_tax = id_top_tax(ord_obj,
                     phylo_obj[[select_barcode]], 
                     metric='feature_F',
                     phylo_depth=depth, 
                     max_p_adj=1,
                     max_p_val=1,
                     guilds=funguild,
                     auto_invert=TRUE) %>% 
  subset(pcAxis1^2 > 0.01)
tt_sort = order(top_tax$pcAxis1, decreasing=TRUE)
taxa_results %<>% update_taxa_results(crop, sampling, select_barcode, contrast, top_tax[tt_sort, ])
##    CROP SAMPLING BARCODE    CONTRAST     ASV        PHYLUM
## 1   soy seedling     ITS FOLLOW_CORN  ASV_32    Ascomycota
## 2   soy seedling     ITS FOLLOW_CORN  ASV_25    Ascomycota
## 3   soy seedling     ITS FOLLOW_CORN  ASV_64 Basidiomycota
## 4   soy seedling     ITS FOLLOW_CORN  ASV_14    Ascomycota
## 5   soy seedling     ITS FOLLOW_CORN  ASV_23    Ascomycota
## 6   soy seedling     ITS FOLLOW_CORN  ASV_65    Ascomycota
## 7   soy seedling     ITS FOLLOW_CORN ASV_259    Ascomycota
## 8   soy seedling     ITS FOLLOW_CORN  ASV_58    Ascomycota
## 9   soy seedling     ITS FOLLOW_CORN  ASV_71    Ascomycota
## 10  soy seedling     ITS FOLLOW_CORN  ASV_29    Ascomycota
## 11  soy seedling     ITS FOLLOW_CORN  ASV_88    Ascomycota
## 12  soy seedling     ITS FOLLOW_CORN  ASV_52    Ascomycota
## 13  soy seedling     ITS FOLLOW_CORN ASV_279    Ascomycota
## 14  soy seedling     ITS FOLLOW_CORN ASV_278    Ascomycota
## 15  soy seedling     ITS FOLLOW_CORN  ASV_46    Ascomycota
##                          TAXA
## 1        Unknown Cladosporium
## 2  Dactylonectria macrodidyma
## 3             Ustilago maydis
## 4          Unknown Alternaria
## 5       Unknown Didymellaceae
## 6            Unknown Branch06
## 7        Unknown Neosetophoma
## 8       Penicillium decumbens
## 9   Unknown Paraphaeosphaeria
## 10    Mycosphaerella tassiana
## 11           Unknown Branch06
## 12     Corynespora cassiicola
## 13    Thelonectria rubrococca
## 14     Piniphoma wesendahlina
## 15           Unknown Branch06
##                                                                       GUILD
## 1  Animal Pathogen-Endophyte-Lichen Parasite-Plant Pathogen-Wood Saprotroph
## 2                                                                         -
## 3                                                            Plant Pathogen
## 4                  Animal Pathogen-Endophyte-Plant Pathogen-Wood Saprotroph
## 5                                                                         -
## 6                                                                         -
## 7                                                      Undefined Saprotroph
## 8                      Dung Saprotroph-Undefined Saprotroph-Wood Saprotroph
## 9                                                      Undefined Saprotroph
## 10                                                           Plant Pathogen
## 11                                                                        -
## 12                                                     Undefined Saprotroph
## 13                                                          Leaf Saprotroph
## 14                                                                        -
## 15                                                                        -
##          AXIS        RSQ      F_OBS  P_VAL     P_ADJ
## 1   0.3926261 0.22895505 12.4680595 0.0017 0.1472200
## 2   0.2270544 0.08400267  3.4219543 0.0608 0.4323127
## 3   0.1973184 0.14163798  6.5200039 0.0181 0.4299791
## 4   0.1777951 0.09546936  4.0386463 0.0523 0.4323127
## 5   0.1605666 0.09644298  3.9113190 0.0584 0.4323127
## 6   0.1587745 0.09614298  5.3484036 0.0300 0.4299791
## 7   0.1516496 0.20874898 10.1400855 0.0043 0.2068778
## 8  -0.1061957 0.02378775  0.8844087 0.3515 0.5896345
## 9  -0.1148074 0.15980403  6.9701302 0.0081 0.3188455
## 10 -0.1362238 0.27602223 14.2143370 0.0004 0.0433000
## 11 -0.1567958 0.08153767  3.2051756 0.0927 0.5078835
## 12 -0.1869930 0.06029506  2.8889799 0.0629 0.4323127
## 13 -0.2435872 0.06363983  2.7724728 0.0961 0.5078835
## 14 -0.2801628 0.36458901 20.7461134 0.0002 0.0433000
## 15 -0.4811718 0.23010605 10.7615763 0.0024 0.1670143

Quite the mix of taxa here.

Flowering

Rotation

Bacteria

crop = 'soy'
sampling = 'flowering'
constraint = 'ROTATION'
partial = 'YEAR'
barcode = '16S'
contrast = 'None'

nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name) 

perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
##   CROP  SAMPLING BARCODE       RSQ    F_OBS   Df  P_VAL
## 1  soy flowering     16S 0.1002728 1.118494 5,34 0.2338
ord_ls[[nn]] = perms  # structure includes: permutation results, observed data (pcwOrd object).

ROTATION explains 10 of soy 16S community variation at flowering (p = 0.23). Therefore, we will not test contrasts with the 16S dataset.

Fungi

barcode = 'ITS'

nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name) 

perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
##   CROP  SAMPLING BARCODE       RSQ    F_OBS   Df  P_VAL
## 1  soy flowering     ITS 0.1546665 1.623074 5,33 0.0119
ord_ls[[nn]] = perms  # structure includes: permutation results, observed data (pcwOrd object).

ROTATION explains 15.5 of soy ITS community variation at flowering (p = 0.012). Therefore, we will test contrasts with the ITS dataset.

Contrasts

Follow Corn

Bacteria

For compatibility with downstream code

contrast = 'FOLLOW_CORN'
barcode = '16S'

contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')

nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr) 

perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
##   CROP  SAMPLING BARCODE    CONTRAST        RSQ     F_OBS   Df P_VAL
## 1  soy flowering     16S FOLLOW_CORN 0.01984165 0.8714307 2,37 0.794
ord_ls[[nn]] = perms

16S communities did not respond to ROTATION at flowering (p = 0.79), as expected based on full rotation/treatment results. We will not test it further or reduce features. We’ll just save the scores for plotting.

Fungi
barcode = 'ITS'

contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')

nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr) 

perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)

The contrast has a negative score, so we will invert it.

contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
##   CROP  SAMPLING BARCODE    CONTRAST       RSQ    F_OBS   Df  P_VAL
## 1  soy flowering     ITS FOLLOW_CORN 0.0698402 2.886661 2,36 0.0043
ord_ls[[nn]] = perms

ITS communities did respond to ROTATION at flowering (p = 0.0043). We will see if these communities predict soy biomass.

Predictive Communities

Here are the axis scores so far. Note we will NOT include 16S in the model.

find_ords = c(crop, sampling, contrast) %>% 
  sapply(grepl, names(ord_ls)) %>% 
  apply(1, all)
local_ords = ord_ls[find_ords] %>% 
  lapply(extract2, 'observed')

sem_name = paste(crop, sampling, contrast, sep='_')
mod_df = generate_prediction_df(local_ords, 
                                crop, 
                                sampling, 
                                contrasts[[crop]][contrast,,drop=FALSE], 
                                partial, 
                                constraint, 
                                response, 
                                plot,
                                auto_invert=TRUE)  # if contrast and axis are negatively correlated, for consistency.
# Add T1 biomass
seed_bio = subset(plot, SAMPLING=='seedling' & CROP==crop, select=c('SAMPLE_ID', 'AV_DW'))
colnames(seed_bio)[2] = 'T1_AV_DW'
seed_bio$SAMPLE_ID %<>% as.character %>% 
  strsplit('_') %>% 
  sapply(function(x) paste(x[1:3], collapse='_'))
mod_df$SID = mod_df$SAMPLE_ID %>% 
  as.character %>% 
  strsplit('_') %>% 
  sapply(function(x) paste(x[1:3], collapse='_'))

mod_df %<>% merge(seed_bio, by.x='SID', by.y='SAMPLE_ID')

head(mod_df)

The SEM models are as described above.

mod_df %<>% na.omit
no_mediation = {
  "
  # Model
  AV_DW ~ YEAR + c*CONTRAST + T1_AV_DW
  AXIS_ITS ~ a*CONTRAST
  
  # covariances
  AXIS_ITS ~~ 0*AV_DW
  
  # Calculate pathways for easy interpretation
  direct := c
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

partial_mediation_ITS = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS + c*CONTRAST + T1_AV_DW
  AXIS_ITS ~ a*CONTRAST
    
  direct := c
  total := c + (a*b)
  indirect_its := a*b
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_mediation_ITS = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS + T1_AV_DW
  AXIS_ITS ~ a*CONTRAST

  # covariances
  indirect_its := a*b
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

model_compares = anova(no_mediation, partial_mediation_ITS, full_mediation_ITS) %T>%
  print
## Chi-Squared Difference Test
## 
##                       Df    AIC    BIC   Chisq Chisq diff Df diff Pr(>Chisq)
## partial_mediation_ITS  2 173.55 188.29  7.1056                              
## no_mediation           3 171.60 184.70  7.1573     0.0517       1     0.8201
## full_mediation_ITS     3 183.16 196.26 18.7119    11.5546       0
keep_mod = which.min(model_compares$AIC) %>% 
  rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_mediation = no_mediation, 
                          partial_mediation_ITS = partial_mediation_ITS, 
                          full_mediation_ITS = full_mediation_ITS, 
                          df = mod_df)

The best model is no_mediation.

mod_summ = summary(get(keep_mod), std=TRUE)
## lavaan 0.6-5 ended normally after 25 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          8
##                                                       
##   Number of observations                            38
##   Number of missing patterns                         1
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 7.157
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.067
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   AV_DW ~                                                               
##     YEAR              1.314    0.287    4.580    0.000    1.314    0.666
##     FOLLOW_COR (c)   -1.023    0.248   -4.133    0.000   -1.023   -0.468
##     T1_AV_DW          2.993    1.461    2.048    0.041    2.993    0.288
##   AXIS_ITS ~                                                            
##     FOLLOW_COR (a)    1.412    0.286    4.933    0.000    1.412    0.646
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .AV_DW ~~                                                              
##    .AXIS_ITS          0.000                               0.000    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .AV_DW            -3.222    0.974   -3.309    0.001   -3.222   -3.265
##    .AXIS_ITS         -0.019    0.123   -0.151    0.880   -0.019   -0.019
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .AV_DW             0.363    0.101    3.593    0.000    0.363    0.373
##    .AXIS_ITS          0.567    0.106    5.376    0.000    0.567    0.582
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     direct           -1.023    0.248   -4.131    0.000   -1.023   -0.468

This is an poor model.

lavaanPlot(model=get(keep_mod), coefs=TRUE)

Fungi did not mediate the effect of FOLLOW_CORN on soy biomass at flowering.

Rotation Length

Bacteria

For compatibility with downstream code

contrast = 'ROTATION_LENGTH'
barcode = '16S'

contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')

nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr) 

perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)

The contrast has a negative score, so we will invert it.

contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
##   CROP  SAMPLING BARCODE        CONTRAST        RSQ    F_OBS   Df P_VAL
## 1  soy flowering     16S ROTATION_LENGTH 0.02291875 1.010265 2,37 0.529
ord_ls[[nn]] = perms

16S communities did not respond to ROTATION at flowering (p = 0.53), as expected based on full rotation/treatment results. We will not test it further or reduce features. We’ll just save the scores for plotting.

Fungi
barcode = 'ITS'

contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')

nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr) 

perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)

The contrast has a negative score, so we will invert it.

contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
##   CROP  SAMPLING BARCODE        CONTRAST        RSQ    F_OBS   Df  P_VAL
## 1  soy flowering     ITS ROTATION_LENGTH 0.03081713 1.219124 2,36 0.2191
ord_ls[[nn]] = perms

ITS communities did not respond to ROTATION at flowering (p = 0.22). We will see if these communities predict soy biomass.

Predictive Communities

Here are the axis scores so far. Note we will NOT include 16S in the model.

find_ords = c(crop, sampling, contrast) %>% 
  sapply(grepl, names(ord_ls)) %>% 
  apply(1, all)
local_ords = ord_ls[find_ords] %>% 
  lapply(extract2, 'observed')

sem_name = paste(crop, sampling, contrast, sep='_')
mod_df = generate_prediction_df(local_ords, 
                                crop, 
                                sampling, 
                                contrasts[[crop]][contrast,,drop=FALSE], 
                                partial, 
                                constraint, 
                                response, 
                                plot,
                                auto_invert=TRUE)  # if contrast and axis are negatively correlated, for consistency.
# Add T1 biomass
seed_bio = subset(plot, SAMPLING=='seedling' & CROP==crop, select=c('SAMPLE_ID', 'AV_DW'))
colnames(seed_bio)[2] = 'T1_AV_DW'
seed_bio$SAMPLE_ID %<>% as.character %>% 
  strsplit('_') %>% 
  sapply(function(x) paste(x[1:3], collapse='_'))
mod_df$SID = mod_df$SAMPLE_ID %>% 
  as.character %>% 
  strsplit('_') %>% 
  sapply(function(x) paste(x[1:3], collapse='_'))

mod_df %<>% merge(seed_bio, by.x='SID', by.y='SAMPLE_ID')

head(mod_df)

The SEM models are as described above.

mod_df %<>% na.omit
no_mediation = {
  "
  # Model
  AV_DW ~ YEAR + c*CONTRAST + T1_AV_DW
  AXIS_ITS ~ a*CONTRAST
  
  # covariances
  AXIS_ITS ~~ 0*AV_DW
  
  # Calculate pathways for easy interpretation
  direct := c
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

partial_mediation_ITS = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS + c*CONTRAST + T1_AV_DW
  AXIS_ITS ~ a*CONTRAST
    
  direct := c
  total := c + (a*b)
  indirect_its := a*b
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_mediation_ITS = {
  "
  AV_DW ~ YEAR + b*AXIS_ITS + T1_AV_DW
  AXIS_ITS ~ a*CONTRAST

  # covariances
  indirect_its := a*b
  "
} %>% 
  gsub('CONTRAST', contrast, .) %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

model_compares = anova(no_mediation, partial_mediation_ITS, full_mediation_ITS) %T>%
  print
## Chi-Squared Difference Test
## 
##                       Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## partial_mediation_ITS  2 181.79 196.53 4.6383                              
## no_mediation           3 180.31 193.41 5.1596    0.52135       1     0.4703
## full_mediation_ITS     3 181.43 194.53 6.2767    1.11705       0
keep_mod = which.min(model_compares$AIC) %>% 
  rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_mediation = no_mediation, 
                          partial_mediation_ITS = partial_mediation_ITS, 
                          full_mediation_ITS = full_mediation_ITS, 
                          df = mod_df)

The best model is no_mediation.

mod_summ = summary(get(keep_mod), std=TRUE)
## lavaan 0.6-5 ended normally after 28 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          8
##                                                       
##   Number of observations                            38
##   Number of missing patterns                         1
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 5.160
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.160
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   AV_DW ~                                                               
##     YEAR              1.573    0.349    4.510    0.000    1.573    0.797
##     ROTATION_L (c)   -0.531    0.167   -3.183    0.001   -0.531   -0.261
##     T1_AV_DW          5.305    1.467    3.617    0.000    5.305    0.510
##   AXIS_ITS ~                                                            
##     ROTATION_L (a)    1.332    0.212    6.281    0.000    1.332    0.654
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .AV_DW ~~                                                              
##    .AXIS_ITS          0.000                               0.000    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .AV_DW            -4.611    1.066   -4.326    0.000   -4.611   -4.672
##    .AXIS_ITS          0.026    0.121    0.217    0.828    0.026    0.027
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .AV_DW             0.464    0.132    3.528    0.000    0.464    0.477
##    .AXIS_ITS          0.557    0.161    3.459    0.001    0.557    0.572
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     direct           -0.531    0.167   -3.181    0.001   -0.531   -0.261

This is a good model.

lavaanPlot(model=get(keep_mod), coefs=TRUE)

Fungi did not mediate the effect of ROTATION_LENGTH on soy biomass at flowering.

Combined - Update manually

Full model predicting yield based on above results.

Approach:

  1. Combine dataframes with yield
  2. Compare the no-microbe model with the full-microbe model
sem_name = 'soy_combined'

models = c('soy_seedling_FOLLOW_CORN',
           'soy_flowering_ROTATION_LENGTH')

plot_sub = c('SAMPLE_ID', 'PLOT', 'YEAR', 'CROP', 'CROPYIELD') %>% 
  plot[, .] %>% 
  subset(!is.na(CROPYIELD))
plot_sub$SID = with(plot_sub, paste(CROP, YEAR, PLOT, sep='_'))

seed = sem_ls[[models[1]]]$df %>% 
  extract(c('SAMPLE_ID', 'AXIS_ITS', 'FOLLOW_CORN'))
colnames(seed)[c(2,3)] %<>% paste('T1', ., sep='_')
seed$SID = seed$SAMPLE_ID %>% 
  as.character %>% 
  strsplit('_') %>% 
  lapply(extract, 1:3) %>% 
  lapply(paste, collapse='_')
seed$SAMPLE_ID = NULL


flow = sem_ls[[models[2]]]$df %>% 
  extract(c('SID', 'ROTATION_LENGTH', 'AV_DW', 'T1_AV_DW'))
colnames(flow)[c(2, 3)] %<>% paste('T2', ., sep='_')

mod_df = merge(plot_sub, seed, by='SID') %>% 
  merge(flow, by='SID')
mod_df %<>% na.omit
no_microbe = {
  "
  CROPYIELD ~ YEAR + T2_AV_DW# + T2_ROTATION_LENGTH + T1_FOLLOW_CORN
  T2_AV_DW ~ YEAR + T2_ROTATION_LENGTH + T1_AV_DW + T1_FOLLOW_CORN
  T1_AV_DW ~ YEAR + T1_FOLLOW_CORN
  
  T1_AXIS_ITS ~ T1_FOLLOW_CORN
  "
} %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

full_microbe = {
  "
  CROPYIELD ~ YEAR + a*T2_AV_DW# + T2_ROTATION_LENGTH + T1_FOLLOW_CORN
  T2_AV_DW ~ YEAR + T2_ROTATION_LENGTH + b*T1_AV_DW + T1_FOLLOW_CORN
  T1_AV_DW ~ YEAR + d*T1_AXIS_ITS
  
  T1_AXIS_ITS ~ T1_FOLLOW_CORN
  
  # microbe paths
  t1_microbe := a*b*d
  "
} %>% 
  sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')

model_compares = anova(no_microbe, full_microbe) %T>% print
## Chi-Squared Difference Test
## 
##              Df    AIC    BIC   Chisq Chisq diff Df diff Pr(>Chisq)
## no_microbe    8 270.82 300.30 12.2709                              
## full_microbe  9 264.94 292.78  8.3893    -3.8815       1          1
keep_mod = which.min(model_compares$AIC) %>% 
  rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_microbe = no_microbe,
                          full_microbe = full_microbe,
                          df = mod_df)

full_microbe is the best, by -5.882 AIC units.

lavaanPlot(model=get(keep_mod), coefs=TRUE, stand=TRUE)
summary(full_microbe, std=TRUE)
## lavaan 0.6-5 ended normally after 33 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         17
##                                                       
##   Number of observations                            38
##   Number of missing patterns                         1
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 8.389
##   Degrees of freedom                                 9
##   P-value (Chi-square)                           0.495
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   CROPYIELD ~                                                           
##     YEAR              1.230    0.129    9.546    0.000    1.230    0.624
##     T2_AV_DW   (a)    0.451    0.082    5.526    0.000    0.451    0.446
##   T2_AV_DW ~                                                            
##     YEAR              1.163    0.257    4.520    0.000    1.163    0.595
##     T2_ROTATIO       -0.634    0.155   -4.080    0.000   -0.634   -0.315
##     T1_AV_DW   (b)    0.198    0.128    1.546    0.122    0.198    0.199
##     T1_FOLLOW_       -1.108    0.239   -4.634    0.000   -1.108   -0.512
##   T1_AV_DW ~                                                            
##     YEAR             -1.193    0.210   -5.671    0.000   -1.193   -0.609
##     T1_AXIS_IT (d)   -0.435    0.114   -3.827    0.000   -0.435   -0.439
##   T1_AXIS_ITS ~                                                         
##     T1_FOLLOW_        1.738    0.224    7.759    0.000    1.738    0.795
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .CROPYIELD        -1.845    0.174  -10.596    0.000   -1.845   -1.872
##    .T2_AV_DW         -1.742    0.385   -4.524    0.000   -1.742   -1.784
##    .T1_AV_DW          1.790    0.358    5.006    0.000    1.790    1.828
##    .T1_AXIS_ITS      -0.023    0.100   -0.230    0.818   -0.023   -0.023
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .CROPYIELD         0.122    0.034    3.599    0.000    0.122    0.126
##    .T2_AV_DW          0.273    0.081    3.365    0.001    0.273    0.286
##    .T1_AV_DW          0.430    0.082    5.235    0.000    0.430    0.449
##    .T1_AXIS_ITS       0.358    0.101    3.551    0.000    0.358    0.368
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     t1_microbe       -0.039    0.028   -1.368    0.171   -0.039   -0.039

Overall

In both corn and soybean, we see the following pattern:

At the seedling stage, the effect of the previous crop is strong. Soybean growing after corn, and corn growing after soybean, has lowerbiomass than if growing after alternate crops. The fungal community fully explains the previous crop effect. The fungal taxa that are encouraged by corn/soybean as the previous crop tend to be pathogens, suggesting they are suppressing the current crop’s growth.

At the flowering stage, the effect of rotation length dominates, with crops growing in long rotations having higher biomass. The microbial community again fully explains the rotation length effect, at least for corn - bacteria is slightly better than fungi in the case of corn. Soybean show mild evidence of the fungal community partially mediation rotation length (slightly lower χ2), but this is not as parsimonious as the no-mediation model.

Export Results

To save: 1. Ordination data (with permutations) 2. SEM models and data.frames 3. list of predictive taxa

# print('update')
out_obj = 'Microbes as Mechanism Results.Rdata'

save(full_ord_results, contrast_ord_results, taxa_results, ord_ls, sem_ls,
     file=here('Data', out_obj))